Large-Scale Experiments with NP Chunking of Polish

Authors

Adam Radziszewski

Adam Pawlaczek

Abstract

Published experiments with shallow parsing for Slavic languages are characterised by the small size of the corpora used. The publication of the National Corpus of Polish (NCP) opened a new opportunity: to test several chunking algorithms on the 1-million-token manually annotated subcorpus of the NCP. We test three Machine Learning techniques: Decision Tree induction, Memory-Based Learning and Conditional Random Fields. We also investigate the influence of tagging errors on the overall chunker performance, which happens to be ...

Similar resources

Experiments in Base-NP Chunking and Its Role in Dependency Parsing for Thai

This paper studies the role of base-NP information in dependency parsing for Thai. The baseline performance reveals that the base-NP chunking task for Thai is much more difficult than those of some languages (like English). The results show that the parsing performance can be improved (from 60.30% to 63.74%) with the use of base-NP chunk information, although the best chunker is still far from ...

An Empirical Study of Vietnamese Noun Phrase Chunking with Discriminative Sequence Models

This paper presents an empirical study of the Vietnamese NP chunking task. We show how to build an annotated corpus for NP chunking and how discriminative sequence models are trained using the corpus. Experimental results using 5-fold cross-validation show that discriminative sequence learning is well suited to Vietnamese chunking. In addition, by empirical experiments we show that the part ...

Fast NP Chunking Using Memory-Based Learning Techniques

In this paper we discuss the application of Memory-Based Learning (MBL) to fast NP chunking. We first discuss the application of a fast decision tree variant of MBL (IGTree) on the dataset described in (Ramshaw and Marcus, 1995), which consists of roughly 50,000 test and 200,000 train items. In a second series of experiments we used an architecture of two cascaded IGTrees. In the second level o...

Proceedings of CoNLL-99, Bergen, Norway, pp. 53-60: Memory-Based Shallow Parsing

We present a memory-based learning (MBL) approach to shallow parsing in which POS tagging, chunking, and identification of syntactic relations are formulated as memory-based modules. The experiments reported in this paper show competitive F-scores on the Wall Street Journal (WSJ) treebank for NP chunking, VP chunking, subject detection, and object detection.

NP Alignment in Bilingual Corpora

We created a simple gold standard for English-Hungarian NP-level alignment, Orwell’s 1984 (since this already exists in manually verified POS-tagged format in many languages thanks to the Multex and MultexEast project), by manually verifying the automatically generated NP chunking (we used the yamcha, mallet and hunchunk taggers) and manually aligning the maximal NPs and PPs. The maximum NP chun...

Publication date: 2012
